Rupaul’s Drag Race is a reality competition show for drag queens in which contestants compete to be America’s next Drag Super Star. Amongst the many topics of debate surrounding the show is whether it has a bias towards young queens.
This project visualises the spread of the ages of Drag Race contestants by season, as well as the ages of the queens that won each season. This will allow a general picture of the age range of the queens as well as the identification of whether there is a historical trend in it.
This data is provided as part of the package dragracer1. The CRAN archive for the package is https://cran.r-project.org/package=dragracer. The github page is https://github.com/svmiller/dragracer
A description of the project is available on the creator, Steven V. Miller’s blog post: “An Empirical Analysis of RuPaul’s Drag Race Contestants.” The data was compiled by scraping the Rupaul’s Drag Race Wiki
Because it is a package, the first time you run this script, you must
run install.packages(dragracer) first.
The dragracer package includes three datasets: rpdr_contestants, which provides information about the contestants, rpdr_contep, which describes each contestant’s performance in each episode, and rpdr_ep, which gives characteristics of each episode. This analysis and visualisation use the rdpr_contestants and rpdr_contep datasets.
The relevant datasets have been exported to csv files using this
code:
write.csv(df, "filepath\\filename.csv", row.names = FALSE).
They are available in the data folder.
#loading required packages with pacman package manager
library(pacman)
pacman::p_load(ggplot2, plotly, magrittr, tidyverse)
library(dragracer) #when loaded returns a catchphrase from the show, omitted from pacman for flair
## The library is now open
#previewing datasets
head(rpdr_contestants)
## # A tibble: 6 x 5
## season contestant age dob hometown
## <chr> <chr> <dbl> <date> <chr>
## 1 S01 BeBe Zahara Benet 28 1981-03-20 Minneapolis, Minnesota
## 2 S01 Nina Flowers 34 1974-02-22 Bayamón, Puerto Rico
## 3 S01 Rebecca Glasscock 26 1983-05-25 Fort Lauderdale, Florida
## 4 S01 Shannel 26 1979-07-03 Las Vegas, Nevada
## 5 S01 Ongina 26 1982-01-06 Los Angeles, California
## 6 S01 Jade 32 1984-11-18 Chicago, Illinois
head(rpdr_contep)
## # A tibble: 6 x 11
## season rank missc contestant episode outcome eliminated participant minichalw
## <chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 S01 1 0 BeBe Zaha~ 1 SAFE 0 1 0
## 2 S01 2 1 Nina Flow~ 1 WIN 0 1 0
## 3 S01 3 0 Rebecca G~ 1 LOW 0 1 0
## 4 S01 4 0 Shannel 1 SAFE 0 1 0
## 5 S01 5 0 Ongina 1 HIGH 0 1 0
## 6 S01 6 0 Jade 1 SAFE 0 1 0
## # ... with 2 more variables: finale <dbl>, penultimate <dbl>
#create a data frame including the contestant's ages, season, and final ranking in the competition
combined <- left_join(rpdr_contestants, rpdr_contep)
## Joining, by = c("season", "contestant")
ranked <- combined %>%
select(c("season", "contestant","rank", "age")) %>%
distinct()
#create a data frame of winners only
winners <- subset(ranked, rank == "1")
#preview data ranked data frame
head(ranked)
## # A tibble: 6 x 4
## season contestant rank age
## <chr> <chr> <dbl> <dbl>
## 1 S01 BeBe Zahara Benet 1 28
## 2 S01 Nina Flowers 2 34
## 3 S01 Rebecca Glasscock 3 26
## 4 S01 Shannel 4 26
## 5 S01 Ongina 5 26
## 6 S01 Jade 6 32
# summary statistics
stats <- tapply(ranked$age, ranked$season, summary)
head(stats, 3)
## $S01
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 26 26 32 31 34 39
##
## $S02
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21.00 25.00 27.00 27.58 29.25 37.00
##
## $S03
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 23.00 26.00 28.00 28.23 29.00 36.00
# violin and scatterplots of all contestants' ages with crossbar of mean
age_all <- ggplot(ranked, aes(x = factor(season), y = age)) +
geom_violin(aes(color = season, fill = season), alpha = 0.5) +
stat_summary(fun = "mean",
geom = "crossbar",
width = 0.3) +
geom_point(aes(color = season, fill = season),
position = position_jitterdodge(jitter.width = .1, dodge.width = 0),
shape=21, size = 1) +
# graph aesthetics
labs(x = element_blank()) +
ggtitle("Rupaul's Drag Race Contestant Ages") +
theme(legend.position="none",
strip.background=element_blank(),
axis.text.x=element_text(size=10,color="black"),
axis.text.y=element_text(size=10,color="black"),
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(size= .75,fill = NA,color = "black"),
plot.margin = unit(c(.5, 0.5, .5, .5), "cm"))
#adding interactive overlay of winners' ages - hover over black dot
age_plot <- ggplotly(age_all + geom_point(data = winners, aes(x = factor(season))))
age_plot <- age_plot %>%
style(hoverinfo = "skip", traces = 1:40) %>%
style(text = paste("Winner's age:", winners$age))
age_plot
This visualisation shows the distribution of ages of Drag Race contestants by season, including points for individual contestants’ ages, a crossbar showing the mean age of for that season, and a point showing the age of the winner, which displays the contestant’s age upon hovering.
Drag Race contestants have ranged in age from 21 to 52. The average age of all participants through season 13 is 29. Winners average age is 27. There does not seem to be a significant trend in age across seasons. The relatively young average age of all contestants and winners upholds the commonly accepted belief that drag is a young girl’s game. As beloved Drag Race contestant Katya Zamolodchkava has said, “There’s nothing more depressing than an older drag queen that doesn’t want to do it.”(2)
In the future, I would like to analyse and visualise this data set
further to tackle additional debates in the fandom. One visualisation
I’d like to do would be of the contestants’ hometowns. Queens come from
all over the country, but there are also biases towards New York and
L.A., with New York queens being particularly vocal about the excellence
and dominance of their scene. Additionally, Puerto Rico features
prominently for consistently fielding at least one queen each season.
Thus, I would be interested in generating a map, but for this project, I
could not find an aesthetically pleasing and practical US grid map. I
was excited about applying a specific bit of knowledge from class in
using the code
separate(rpdr_contestants, col = hometown, c("town", "state"), sep = ",")
to divide data combined into one column, ie hometown, into separate
columns, ie, town and state.
References